Functional Linear Regression - Attempts to improve the model fit


Recall that the residuals in the first model attempt were rather structured, indicating a major issue in the model of unknown source. In order to address for this structure and to improve the general model fit in the process, several model modifications are proposed. These include the transformation of the target functions, adding more PCs of the climate variables, adding non-linear effects and checking for possible interactions.

As a reminder, the summary output of the initial model is given here:


## Response PC1 :
## 
## Call:
## lm(formula = PC1 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip, data = d_train %>% select(-c(clay_fraction)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.6863  -1.9650  -0.0036   1.9651   7.4673 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -2.199e+01  1.962e+00 -11.209  < 2e-16 ***
## Lon                          -6.785e-03  8.459e-04  -8.021 2.19e-15 ***
## Lat                           2.605e-01  1.631e-02  15.979  < 2e-16 ***
## sand_fraction                 2.192e+01  1.223e+00  17.913  < 2e-16 ***
## silt_fraction                 1.410e+01  2.113e+00   6.671 3.64e-11 ***
## bulkdensity_soil             -3.708e+00  7.292e-01  -5.085 4.17e-07 ***
## ph_soil                      -3.084e-01  1.710e-01  -1.804  0.07149 .  
## soilcarbon                   -4.043e-01  6.431e-02  -6.287 4.30e-10 ***
## time_since_dist               4.360e-04  5.782e-04   0.754  0.45098    
## ScenarioSSP1-RCP2.6          -1.794e+00  2.126e-01  -8.439  < 2e-16 ***
## ScenarioSSP3-RCP7.0          -1.309e+00  2.409e-01  -5.431 6.60e-08 ***
## ScenarioSSP5-RCP8.5          -1.343e+00  2.590e-01  -5.184 2.49e-07 ***
## initial_recruitment_BNE       1.084e-02  2.404e-03   4.507 7.12e-06 ***
## initial_recruitment_IBS       2.791e-03  4.060e-03   0.687  0.49192    
## initial_recruitment_TeBS      4.896e-02  1.832e-02   2.673  0.00761 ** 
## initial_recruitment_Tundra   -9.699e-03  3.652e-03  -2.656  0.00801 ** 
## initial_recruitment_otherC   -9.609e-03  4.702e-03  -2.044  0.04116 *  
## recruitment_ten_years_BNE    -3.912e-04  1.731e-04  -2.260  0.02395 *  
## recruitment_ten_years_IBS     1.260e-03  6.809e-04   1.850  0.06455 .  
## recruitment_ten_years_TeBS   -6.401e-03  3.263e-03  -1.961  0.05003 .  
## recruitment_ten_years_Tundra  3.391e-04  2.358e-04   1.438  0.15068    
## recruitment_ten_years_otherC  6.490e-04  1.414e-03   0.459  0.64629    
## previous_state_BNE            4.760e-03  2.538e-02   0.188  0.85125    
## previous_state_IBS           -7.137e-03  1.818e-02  -0.393  0.69465    
## previous_state_TeBS           1.657e-01  1.141e-01   1.452  0.14666    
## previous_state_Tundra         5.411e-03  1.529e-01   0.035  0.97179    
## previous_state_otherC        -1.832e-01  3.789e-02  -4.834 1.49e-06 ***
## PC1_temp                      6.788e-02  3.771e-03  18.002  < 2e-16 ***
## PC1_precip                   -6.060e-04  3.770e-05 -16.073  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.632 on 1414 degrees of freedom
## Multiple R-squared:  0.6317, Adjusted R-squared:  0.6244 
## F-statistic: 86.61 on 28 and 1414 DF,  p-value: < 2.2e-16
## 
## 
## Response PC2 :
## 
## Call:
## lm(formula = PC2 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip, data = d_train %>% select(-c(clay_fraction)))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.1278 -0.2266  0.2995  0.7167  8.0355 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   2.101e+00  1.093e+00   1.923 0.054667 .  
## Lon                          -4.360e-05  4.711e-04  -0.093 0.926271    
## Lat                          -3.390e-02  9.081e-03  -3.733 0.000197 ***
## sand_fraction                -3.266e-01  6.814e-01  -0.479 0.631762    
## silt_fraction                -2.080e+00  1.177e+00  -1.767 0.077475 .  
## bulkdensity_soil              3.048e-01  4.061e-01   0.750 0.453082    
## ph_soil                       3.194e-02  9.523e-02   0.335 0.737422    
## soilcarbon                    3.125e-02  3.582e-02   0.873 0.383077    
## time_since_dist              -2.938e-04  3.221e-04  -0.912 0.361716    
## ScenarioSSP1-RCP2.6           4.812e-01  1.184e-01   4.064 5.08e-05 ***
## ScenarioSSP3-RCP7.0           7.720e-01  1.342e-01   5.753 1.07e-08 ***
## ScenarioSSP5-RCP8.5           7.206e-01  1.443e-01   4.995 6.63e-07 ***
## initial_recruitment_BNE       3.832e-04  1.339e-03   0.286 0.774810    
## initial_recruitment_IBS      -8.366e-03  2.261e-03  -3.699 0.000224 ***
## initial_recruitment_TeBS     -1.302e-01  1.020e-02 -12.762  < 2e-16 ***
## initial_recruitment_Tundra    2.018e-03  2.034e-03   0.992 0.321272    
## initial_recruitment_otherC   -3.382e-03  2.619e-03  -1.291 0.196776    
## recruitment_ten_years_BNE     7.787e-05  9.638e-05   0.808 0.419234    
## recruitment_ten_years_IBS     1.375e-03  3.792e-04   3.625 0.000299 ***
## recruitment_ten_years_TeBS    1.821e-02  1.817e-03  10.018  < 2e-16 ***
## recruitment_ten_years_Tundra -2.333e-04  1.313e-04  -1.776 0.075869 .  
## recruitment_ten_years_otherC  1.093e-04  7.875e-04   0.139 0.889650    
## previous_state_BNE           -3.092e-02  1.414e-02  -2.187 0.028886 *  
## previous_state_IBS           -3.948e-03  1.012e-02  -0.390 0.696571    
## previous_state_TeBS          -3.350e-01  6.354e-02  -5.272 1.56e-07 ***
## previous_state_Tundra        -2.054e-01  8.518e-02  -2.412 0.016003 *  
## previous_state_otherC        -9.795e-02  2.110e-02  -4.642 3.78e-06 ***
## PC1_temp                      2.264e-02  2.100e-03  10.781  < 2e-16 ***
## PC1_precip                    3.523e-05  2.100e-05   1.678 0.093629 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 1414 degrees of freedom
## Multiple R-squared:  0.3486, Adjusted R-squared:  0.3357 
## F-statistic: 27.03 on 28 and 1414 DF,  p-value: < 2.2e-16

Figure 1 shows the residuals and Q-Q plots from last week:


Figure 1: Model residuals and Q-Q plots for both predicted PCs (on train data set).
Figure 1: Model residuals and Q-Q plots for both predicted PCs (on train data set).

Figure 2 shows the true PC scores vs. the predicted PC scores for the test set.


Figure 2: True vs. fitted PC scores for unseen data.
Figure 2: True vs. fitted PC scores for unseen data.

Figure 3 shows for four randomly picked disturbed grid cells, one for each scenario, the true functional fit and the predicted one from the model.


Figure 3: True functional fit and predicted functional fit for four different example grid cells, one for each scenario.
Figure 3: True functional fit and predicted functional fit for four different example grid cells, one for each scenario.

1. Removing covariates

As a first step to improve the model and to address the structure in the residuals, several configurations of covariates are tested, here as linear effects. None of these attempts improved the model substantially. It became clear that no matter which variables are in the model, the structure in the residuals is present. This leads to the conclusion that not one (or multiple) covariates are responsible, but rather the target variable or the model itself.

A common technique in variable selection is a stepwise approach based on AIC. In particular this means that for each considered variable configuration, the AIC is derived by

AIC \(= 2k - 2\ln{L} = 2k + n \left (r\ln{2\pi} + \ln{|\Sigma|} + r\right )\),

where \(n\) is the number of observations, \(k\) is the number of parameters in the model, and \(r\) is the number of response variables. With that, adding and removing covariates is assessed in a stepwise manner.

The resulting model improves the AIC only marginally (from 12103 to 12099) and removes variable recruitment_ten_years_otherC. Removing this covariate only is not an option though, since eather all five recruitment_ten_years variables should be included, or none of the five. In addition, removing this covariate does not account for the structured residuals.

Note that in this approach, only linear effects and no interactions are considered.


2. Data transformation: Transform the PFT shares

As a second attempt, the shares of PFT are transformed. Recall that in the data pre-processing, the absolute amounts of aboveground carbon (cmass) were transformed to portions relative to all five provided PFTs, i.e., all values are in the interval \([0,1]\) and these values are the input for the MFPCA in the subsequent step. Now, two different approaches based on transformed values are explored: first, we take a look at transforming the probabilities, denoted as \(\mathbf{p} = (p_\text{BNE}, p_\text{IBS}, p_\text{otherC}, p_\text{TeBS}, p_\text{Tundra}) = (p_1,p_2,p_3,p_4,p_5)\) with the softmax function. Second, we fit the model using the initial absolute values of cmass.

2.1 Transformation via Softmax Function

The softmax function is given as
\(p_i = \frac{e^{x_i}}{\sum_{j=1}^5 e^{x_j}}\), \(i=1,...,5\).

Rearranging this equation to isolate \(x_i\) yields

\(x_i = \log(p_i) + \log\left(\sum_{j=1}^5 e^{x_j}\right) =: \log(p_i) + C\), \(i = 1,...,5\).

Since every \(x_i\) depends on all the \(x_j\), \(j=1,...5\), solving for \(\mathbf{x} = (x_1,x_2,x_3,x_4,x_5)\) requires constraints on the calculation. Two possible constraints are examined in more detail in the following:

  • Constrain \(C\) to be zero, i.e.:
    \(\sum_{j=1}^5 e^{x_j} = 0 \quad \rightarrow \quad x_i = \log(p_i), \quad i = 1, ..., 5\).
  • Choose a reference category (e.g., Tundra):
    \(x_i = \log\left(\frac{p_i}{p_5}\right), \quad i = 1,..., 4, \quad x_5 = 0.\)
The second approach is not valid in this case. Choosing one reference category implies setting one of the five data matrices of dimension \(n_\text{locations} \times 100\) equal to 0, making it impossible to derive a proper eigendecomposition of the data. That is, a MFPCA cannot longer be performed. Consequently, this analysis focuses on the first approach. Since some of the shares are equal to 0, the transformation is set to
\(x_i = \log(1+p_i), \quad i = 1, ..., 5\)

in order to avoid computation errors.

Another possibility would be to apply a Newton-Raphson approximation to some initial (guessed) values of \(\mathbf{x}\). This approach is not tested yet.

The model output for the log-transformed possibilities is given here:


## Response PC1 :
## 
## Call:
## lm(formula = PC1 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip, data = d_train %>% select(-clay_fraction))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5428 -1.3720 -0.0238  1.4115  5.4572 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.585e+01  1.422e+00 -11.152  < 2e-16 ***
## Lon                          -4.930e-03  6.130e-04  -8.043 1.83e-15 ***
## Lat                           1.863e-01  1.182e-02  15.772  < 2e-16 ***
## sand_fraction                 1.603e+01  8.866e-01  18.083  < 2e-16 ***
## silt_fraction                 1.025e+01  1.531e+00   6.695 3.11e-11 ***
## bulkdensity_soil             -2.732e+00  5.284e-01  -5.170 2.67e-07 ***
## ph_soil                      -2.074e-01  1.239e-01  -1.674   0.0944 .  
## soilcarbon                   -2.967e-01  4.661e-02  -6.366 2.62e-10 ***
## time_since_dist               2.984e-04  4.190e-04   0.712   0.4765    
## ScenarioSSP1-RCP2.6          -1.271e+00  1.541e-01  -8.251 3.57e-16 ***
## ScenarioSSP3-RCP7.0          -9.066e-01  1.746e-01  -5.192 2.38e-07 ***
## ScenarioSSP5-RCP8.5          -9.139e-01  1.877e-01  -4.869 1.25e-06 ***
## initial_recruitment_BNE       8.064e-03  1.742e-03   4.629 4.02e-06 ***
## initial_recruitment_IBS       1.186e-03  2.942e-03   0.403   0.6871    
## initial_recruitment_TeBS      2.669e-02  1.328e-02   2.010   0.0446 *  
## initial_recruitment_Tundra   -7.673e-03  2.647e-03  -2.899   0.0038 ** 
## initial_recruitment_otherC   -6.668e-03  3.407e-03  -1.957   0.0505 .  
## recruitment_ten_years_BNE    -2.705e-04  1.254e-04  -2.157   0.0312 *  
## recruitment_ten_years_IBS     1.175e-03  4.934e-04   2.381   0.0174 *  
## recruitment_ten_years_TeBS   -3.328e-03  2.365e-03  -1.408   0.1595    
## recruitment_ten_years_Tundra  2.802e-04  1.709e-04   1.640   0.1013    
## recruitment_ten_years_otherC  3.132e-04  1.025e-03   0.306   0.7599    
## previous_state_BNE           -3.291e-03  1.839e-02  -0.179   0.8580    
## previous_state_IBS           -9.453e-03  1.317e-02  -0.718   0.4731    
## previous_state_TeBS           9.664e-02  8.267e-02   1.169   0.2426    
## previous_state_Tundra        -7.268e-02  1.108e-01  -0.656   0.5121    
## previous_state_otherC        -1.458e-01  2.746e-02  -5.311 1.26e-07 ***
## PC1_temp                      5.042e-02  2.732e-03  18.454  < 2e-16 ***
## PC1_precip                   -4.357e-04  2.732e-05 -15.946  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.907 on 1414 degrees of freedom
## Multiple R-squared:  0.6385, Adjusted R-squared:  0.6313 
## F-statistic: 89.19 on 28 and 1414 DF,  p-value: < 2.2e-16
## 
## 
## Response PC2 :
## 
## Call:
## lm(formula = PC2 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip, data = d_train %>% select(-clay_fraction))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9324 -0.1538  0.2059  0.5161  5.9487 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   2.284e+00  7.926e-01   2.882 0.004017 ** 
## Lon                          -2.144e-05  3.418e-04  -0.063 0.949988    
## Lat                          -3.114e-02  6.588e-03  -4.726 2.52e-06 ***
## sand_fraction                -6.612e-01  4.943e-01  -1.338 0.181267    
## silt_fraction                -1.586e+00  8.538e-01  -1.858 0.063375 .  
## bulkdensity_soil              1.842e-01  2.946e-01   0.625 0.531945    
## ph_soil                       1.930e-02  6.909e-02   0.279 0.780026    
## soilcarbon                    2.027e-02  2.598e-02   0.780 0.435541    
## time_since_dist              -2.541e-04  2.336e-04  -1.088 0.276887    
## ScenarioSSP1-RCP2.6           2.582e-01  8.589e-02   3.007 0.002689 ** 
## ScenarioSSP3-RCP7.0           4.762e-01  9.735e-02   4.892 1.11e-06 ***
## ScenarioSSP5-RCP8.5           4.337e-01  1.047e-01   4.144 3.61e-05 ***
## initial_recruitment_BNE       4.683e-04  9.714e-04   0.482 0.629840    
## initial_recruitment_IBS      -6.660e-03  1.640e-03  -4.060 5.18e-05 ***
## initial_recruitment_TeBS     -9.396e-02  7.402e-03 -12.694  < 2e-16 ***
## initial_recruitment_Tundra    1.824e-03  1.476e-03   1.236 0.216723    
## initial_recruitment_otherC   -2.686e-03  1.900e-03  -1.414 0.157518    
## recruitment_ten_years_BNE     3.944e-05  6.992e-05   0.564 0.572727    
## recruitment_ten_years_IBS     1.022e-03  2.751e-04   3.715 0.000212 ***
## recruitment_ten_years_TeBS    1.301e-02  1.318e-03   9.870  < 2e-16 ***
## recruitment_ten_years_Tundra -1.732e-04  9.527e-05  -1.818 0.069310 .  
## recruitment_ten_years_otherC  1.323e-04  5.713e-04   0.232 0.816835    
## previous_state_BNE           -2.246e-02  1.025e-02  -2.191 0.028650 *  
## previous_state_IBS           -6.764e-03  7.344e-03  -0.921 0.357185    
## previous_state_TeBS          -2.468e-01  4.610e-02  -5.354 1.00e-07 ***
## previous_state_Tundra        -1.539e-01  6.180e-02  -2.490 0.012875 *  
## previous_state_otherC        -6.290e-02  1.531e-02  -4.109 4.21e-05 ***
## PC1_temp                      1.576e-02  1.523e-03  10.343  < 2e-16 ***
## PC1_precip                    3.811e-05  1.523e-05   2.502 0.012479 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.063 on 1414 degrees of freedom
## Multiple R-squared:  0.3558, Adjusted R-squared:  0.343 
## F-statistic: 27.89 on 28 and 1414 DF,  p-value: < 2.2e-16

Unfortunately, this transformation does not solve the initial problem of structured residuals. Looking at Figure 4 still reveals some structure:


Figure 4: Model residuals and Q-Q plots for both predicted PCs for log-transformed probabilities (on train data set).
Figure 4: Model residuals and Q-Q plots for both predicted PCs for log-transformed probabilities (on train data set).

Interestingly, the plot only deviates slightly from the residuals for the non-transformed PFT-shares (Figure 1). But it should be noted that the scale of the residuals decreases.


Figure 5: True vs. fitted PC scores for unseen data with log-transformed PFT-shares.
Figure 5: True vs. fitted PC scores for unseen data with log-transformed PFT-shares.

Finally, Figure 6 shows the resulting functional fit for four randomly chosen disturbed grid cells:


Figure 6: True functional fit and predicted functional fit for four different example grid cells, one for each scenario, and log-transformed PFT-shares.
Figure 6: True functional fit and predicted functional fit for four different example grid cells, one for each scenario, and log-transformed PFT-shares.

Also here the differences are rather small.

Questions here: This was to be expected, right? The log-transformation only does not solve the problem. How else could \(\mathbf{p}\) be transformed? Newton-Raphson?


2.2 Taking absolute values of aboveground carbon instead of portions

Second, the absolute values of aboveground carbon are used for MFPCA and model fitting. First, let’s take a look at the model summary:


## Response PC1 :
## 
## Call:
## lm(formula = PC1 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip, data = d_train %>% select(-clay_fraction))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -89.870 -16.757   1.503  18.760 102.561 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   1.730e+02  1.941e+01   8.916  < 2e-16 ***
## Lon                           3.925e-02  8.368e-03   4.690 2.99e-06 ***
## Lat                          -2.622e+00  1.613e-01 -16.256  < 2e-16 ***
## sand_fraction                -1.564e+02  1.210e+01 -12.925  < 2e-16 ***
## silt_fraction                -9.865e+01  2.091e+01  -4.719 2.61e-06 ***
## bulkdensity_soil              2.919e+01  7.214e+00   4.047 5.48e-05 ***
## ph_soil                       3.749e+00  1.692e+00   2.216 0.026846 *  
## soilcarbon                    3.299e+00  6.363e-01   5.185 2.47e-07 ***
## time_since_dist              -3.156e-03  5.721e-03  -0.552 0.581283    
## ScenarioSSP1-RCP2.6           2.329e+01  2.103e+00  11.076  < 2e-16 ***
## ScenarioSSP3-RCP7.0           2.617e+01  2.384e+00  10.980  < 2e-16 ***
## ScenarioSSP5-RCP8.5           2.784e+01  2.563e+00  10.863  < 2e-16 ***
## initial_recruitment_BNE      -1.089e-01  2.378e-02  -4.578 5.11e-06 ***
## initial_recruitment_IBS      -4.275e-03  4.017e-02  -0.106 0.915254    
## initial_recruitment_TeBS     -9.621e-01  1.812e-01  -5.309 1.28e-07 ***
## initial_recruitment_Tundra    1.480e-01  3.613e-02   4.095 4.47e-05 ***
## initial_recruitment_otherC    4.895e-02  4.651e-02   1.052 0.292761    
## recruitment_ten_years_BNE     3.271e-03  1.712e-03   1.911 0.056214 .  
## recruitment_ten_years_IBS    -1.911e-02  6.736e-03  -2.837 0.004620 ** 
## recruitment_ten_years_TeBS    1.087e-01  3.228e-02   3.366 0.000783 ***
## recruitment_ten_years_Tundra -7.378e-03  2.333e-03  -3.163 0.001597 ** 
## recruitment_ten_years_otherC  2.255e-03  1.399e-02   0.161 0.871942    
## previous_state_BNE            1.819e-01  2.511e-01   0.725 0.468848    
## previous_state_IBS            5.788e-01  1.798e-01   3.219 0.001316 ** 
## previous_state_TeBS          -1.915e+00  1.129e+00  -1.697 0.089948 .  
## previous_state_Tundra         2.825e+00  1.513e+00   1.867 0.062133 .  
## previous_state_otherC         1.690e+00  3.748e-01   4.507 7.11e-06 ***
## PC1_temp                     -4.755e-01  3.730e-02 -12.749  < 2e-16 ***
## PC1_precip                    5.621e-03  3.730e-04  15.069  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.03 on 1414 degrees of freedom
## Multiple R-squared:  0.6045, Adjusted R-squared:  0.5966 
## F-statistic: 77.18 on 28 and 1414 DF,  p-value: < 2.2e-16
## 
## 
## Response PC2 :
## 
## Call:
## lm(formula = PC2 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip, data = d_train %>% select(-clay_fraction))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -73.556  -5.770  -1.736   2.390  75.515 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -7.356e+00  9.656e+00  -0.762  0.44630    
## Lon                           8.891e-03  4.164e-03   2.135  0.03290 *  
## Lat                           1.649e-01  8.026e-02   2.055  0.04005 *  
## sand_fraction                -1.291e+01  6.022e+00  -2.143  0.03226 *  
## silt_fraction                 3.140e+00  1.040e+01   0.302  0.76280    
## bulkdensity_soil              6.296e+00  3.589e+00   1.754  0.07964 .  
## ph_soil                      -1.274e+00  8.417e-01  -1.514  0.13036    
## soilcarbon                    5.147e-01  3.166e-01   1.626  0.10422    
## time_since_dist               2.818e-03  2.846e-03   0.990  0.32232    
## ScenarioSSP1-RCP2.6          -1.515e+00  1.046e+00  -1.448  0.14776    
## ScenarioSSP3-RCP7.0          -3.832e+00  1.186e+00  -3.231  0.00126 ** 
## ScenarioSSP5-RCP8.5          -3.423e+00  1.275e+00  -2.685  0.00734 ** 
## initial_recruitment_BNE      -2.804e-02  1.183e-02  -2.369  0.01797 *  
## initial_recruitment_IBS       8.411e-02  1.999e-02   4.208 2.73e-05 ***
## initial_recruitment_TeBS      9.522e-01  9.018e-02  10.559  < 2e-16 ***
## initial_recruitment_Tundra   -1.512e-02  1.798e-02  -0.841  0.40035    
## initial_recruitment_otherC    3.565e-02  2.314e-02   1.541  0.12365    
## recruitment_ten_years_BNE     2.148e-04  8.518e-04   0.252  0.80097    
## recruitment_ten_years_IBS    -1.100e-02  3.352e-03  -3.283  0.00105 ** 
## recruitment_ten_years_TeBS   -1.194e-01  1.606e-02  -7.435 1.81e-13 ***
## recruitment_ten_years_Tundra  6.238e-04  1.161e-03   0.537  0.59106    
## recruitment_ten_years_otherC  4.661e-03  6.960e-03   0.670  0.50319    
## previous_state_BNE            2.549e-01  1.249e-01   2.041  0.04147 *  
## previous_state_IBS            1.074e-01  8.947e-02   1.200  0.23026    
## previous_state_TeBS           2.449e+00  5.616e-01   4.360 1.39e-05 ***
## previous_state_Tundra         1.534e+00  7.528e-01   2.037  0.04183 *  
## previous_state_otherC         8.745e-01  1.865e-01   4.689 3.02e-06 ***
## PC1_temp                     -2.278e-01  1.856e-02 -12.275  < 2e-16 ***
## PC1_precip                   -1.866e-05  1.856e-04  -0.101  0.91992    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.95 on 1414 degrees of freedom
## Multiple R-squared:  0.3731, Adjusted R-squared:  0.3607 
## F-statistic: 30.06 on 28 and 1414 DF,  p-value: < 2.2e-16

Again, this modification does not improve the initial residual problem, as Figure 7 indicates:


Figure 7: Model residuals and Q-Q plots for both predicted PCs for absolute cmass values (on train data set).
Figure 7: Model residuals and Q-Q plots for both predicted PCs for absolute cmass values (on train data set).

Now, the residuals clearly follow a different structure than before, but are still bounded. Figure 8 shows again the accuracy of the predictions on unseen data:


Figure 8: True vs. fitted PC scores for unseen data for absolute cmass values.
Figure 8: True vs. fitted PC scores for unseen data for absolute cmass values.

Now, let’s take a look at functional predictions. Since the targets are total values of aboveground carbon, the general curve behavior of course deviates from the usual one.


Figure 9: True functional fit and predicted functional fit for four different example grid cells, one for each scenario, and absolut cmass values.
Figure 9: True functional fit and predicted functional fit for four different example grid cells, one for each scenario, and absolut cmass values.

In order to compare this fit to the fits from the two models above, the predicted cmass values are transformed into proportions. One issue arises when choosing this approach: when transforming the predicted cmass values to shares, there appear some negative cmass values. In this approach, all negative shares are set to 0 which of course induces additional bias. This procedure was applied when deriving Figure 10:


Figure 10: True functional fit and predicted functional fit for four different example grid cells, one for each scenario, and absolut cmass values transformed to proportions.
Figure 10: True functional fit and predicted functional fit for four different example grid cells, one for each scenario, and absolut cmass values transformed to proportions.

For these example grid cells, the fit seems more appropriate than for the first two models.


3. Include more PCs for both temperature and precipitation

Several numbers of PCs were checked with no change to the structural residuals. Therefore, the model outputs and corresponding plots are not reported here.

4. Check for non-linear effects: Fitting an additive model

Several combinations of smooth effects were tested. For variables Lon and PC1_temp it might make sense to include additive effects. However, this does not change the structural behavior. The output of the model is given in the following:


## 
## Family: Multivariate normal 
## Link function: 
## 
## Formula:
## PC1 ~ s(Lon) + Lat + sand_fraction + silt_fraction + bulkdensity_soil + 
##     ph_soil + soilcarbon + Scenario + time_since_dist + initial_recruitment_BNE + 
##     initial_recruitment_IBS + initial_recruitment_otherC + initial_recruitment_TeBS + 
##     initial_recruitment_Tundra + recruitment_ten_years_BNE + 
##     recruitment_ten_years_IBS + recruitment_ten_years_otherC + 
##     recruitment_ten_years_TeBS + recruitment_ten_years_Tundra + 
##     previous_state_BNE + previous_state_IBS + previous_state_otherC + 
##     previous_state_TeBS + previous_state_Tundra + s(PC1_temp) + 
##     PC1_precip + PC1_temp:PC1_precip
## PC2 ~ s(Lon) + Lat + sand_fraction + silt_fraction + bulkdensity_soil + 
##     ph_soil + soilcarbon + Scenario + time_since_dist + initial_recruitment_BNE + 
##     initial_recruitment_IBS + initial_recruitment_otherC + initial_recruitment_TeBS + 
##     initial_recruitment_Tundra + recruitment_ten_years_BNE + 
##     recruitment_ten_years_IBS + recruitment_ten_years_otherC + 
##     recruitment_ten_years_TeBS + recruitment_ten_years_Tundra + 
##     previous_state_BNE + previous_state_IBS + previous_state_otherC + 
##     previous_state_TeBS + previous_state_Tundra + s(PC1_temp) + 
##     PC1_precip + PC1_temp:PC1_precip
## 
## Parametric coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    -1.151e+01  2.746e+00  -4.192 2.76e-05 ***
## Lat                             2.687e-03  3.990e-02   0.067 0.946309    
## sand_fraction                   2.069e+01  1.098e+00  18.842  < 2e-16 ***
## silt_fraction                   1.196e+01  1.879e+00   6.367 1.92e-10 ***
## bulkdensity_soil               -2.380e+00  6.883e-01  -3.458 0.000545 ***
## ph_soil                         9.795e-02  1.655e-01   0.592 0.553973    
## soilcarbon                     -2.096e-01  6.191e-02  -3.385 0.000711 ***
## ScenarioSSP1-RCP2.6            -8.654e-01  2.135e-01  -4.054 5.04e-05 ***
## ScenarioSSP3-RCP7.0             5.844e-01  2.879e-01   2.030 0.042343 *  
## ScenarioSSP5-RCP8.5             8.045e-01  3.321e-01   2.422 0.015422 *  
## time_since_dist                 3.895e-04  5.063e-04   0.769 0.441749    
## initial_recruitment_BNE         7.321e-03  2.122e-03   3.449 0.000562 ***
## initial_recruitment_IBS         2.174e-03  3.569e-03   0.609 0.542400    
## initial_recruitment_otherC     -1.128e-02  4.112e-03  -2.743 0.006095 ** 
## initial_recruitment_TeBS        1.401e-02  1.656e-02   0.846 0.397617    
## initial_recruitment_Tundra     -4.262e-03  3.196e-03  -1.333 0.182393    
## recruitment_ten_years_BNE      -1.484e-04  1.526e-04  -0.972 0.330995    
## recruitment_ten_years_IBS       8.387e-04  5.975e-04   1.404 0.160386    
## recruitment_ten_years_otherC    2.548e-04  1.236e-03   0.206 0.836699    
## recruitment_ten_years_TeBS     -2.781e-03  2.890e-03  -0.962 0.335859    
## recruitment_ten_years_Tundra   -8.183e-05  2.067e-04  -0.396 0.692206    
## previous_state_BNE              1.224e-02  2.225e-02   0.550 0.582352    
## previous_state_IBS              1.856e-02  1.608e-02   1.154 0.248584    
## previous_state_otherC          -1.357e-01  3.382e-02  -4.011 6.04e-05 ***
## previous_state_TeBS             1.339e-01  9.962e-02   1.344 0.179056    
## previous_state_Tundra           1.151e-01  1.340e-01   0.859 0.390148    
## PC1_precip                     -4.042e-04  4.698e-05  -8.604  < 2e-16 ***
## PC1_precip:PC1_temp            -8.750e-07  1.013e-06  -0.864 0.387606    
## (Intercept).1                  -6.782e-02  1.420e+00  -0.048 0.961921    
## Lat.1                          -1.988e-02  2.065e-02  -0.962 0.335876    
## sand_fraction.1                 4.832e-01  5.663e-01   0.853 0.393497    
## silt_fraction.1                -1.794e+00  9.690e-01  -1.851 0.064176 .  
## bulkdensity_soil.1              2.117e-01  3.550e-01   0.596 0.550990    
## ph_soil.1                       2.077e-01  8.546e-02   2.430 0.015095 *  
## soilcarbon.1                    2.437e-02  3.192e-02   0.763 0.445323    
## ScenarioSSP1-RCP2.6.1           5.233e-02  1.101e-01   0.475 0.634550    
## ScenarioSSP3-RCP7.0.1           4.294e-01  1.486e-01   2.890 0.003858 ** 
## ScenarioSSP5-RCP8.5.1           7.131e-01  1.716e-01   4.156 3.24e-05 ***
## time_since_dist.1              -1.100e-04  2.613e-04  -0.421 0.673894    
## initial_recruitment_BNE.1       8.560e-05  1.095e-03   0.078 0.937700    
## initial_recruitment_IBS.1      -4.230e-03  1.841e-03  -2.298 0.021543 *  
## initial_recruitment_otherC.1   -2.440e-03  2.122e-03  -1.150 0.250177    
## initial_recruitment_TeBS.1     -7.602e-02  8.697e-03  -8.742  < 2e-16 ***
## initial_recruitment_Tundra.1    2.219e-04  1.648e-03   0.135 0.892884    
## recruitment_ten_years_BNE.1     3.091e-05  7.871e-05   0.393 0.694588    
## recruitment_ten_years_IBS.1     9.654e-04  3.082e-04   3.133 0.001732 ** 
## recruitment_ten_years_otherC.1  4.427e-04  6.379e-04   0.694 0.487715    
## recruitment_ten_years_TeBS.1    1.178e-02  1.503e-03   7.842 4.45e-15 ***
## recruitment_ten_years_Tundra.1 -1.554e-04  1.066e-04  -1.457 0.145017    
## previous_state_BNE.1           -6.702e-03  1.148e-02  -0.584 0.559482    
## previous_state_IBS.1           -2.271e-03  8.294e-03  -0.274 0.784274    
## previous_state_otherC.1        -7.876e-02  1.746e-02  -4.511 6.45e-06 ***
## previous_state_TeBS.1          -2.402e-01  5.137e-02  -4.676 2.92e-06 ***
## previous_state_Tundra.1        -1.582e-01  6.915e-02  -2.288 0.022112 *  
## PC1_precip.1                    2.078e-05  2.432e-05   0.854 0.392853    
## PC1_precip:PC1_temp.1           3.037e-06  5.249e-07   5.786 7.21e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                 edf Ref.df Chi.sq p-value    
## s(Lon)        8.236  8.843 429.48  <2e-16 ***
## s(PC1_temp)   5.430  6.642 383.44  <2e-16 ***
## s.1(Lon)      8.264  8.854  69.54  <2e-16 ***
## s.1(PC1_temp) 7.729  8.610 609.11  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Deviance explained =   66%
## -REML = 3193.5  Scale est. = 1         n = 1443

Figure 11: Smooth effects of both covariates Lon and PC1_temp for the additive model (on train data set).
Figure 11: Smooth effects of both covariates Lon and PC1_temp for the additive model (on train data set).

Figure 11 shows the residuals and Q-Q plots. Again, there is no major difference to the residuals and Q-Q plots from Figure 1.


Figure 12: Model residuals and Q-Q plots for both predicted PCs for the gam (on train data set).
Figure 12: Model residuals and Q-Q plots for both predicted PCs for the gam (on train data set).

The two plots indicating the goodness of prediction are leaved out here.


5. Check for interactions

To improve the general model fit, several interaction terms are evaluated:

  • Lon - Lat: small effects, significant for both PCs
  • sand_fraction - silt_fraction: very small effects, significant for PC 1 only
  • PC1_temp - PC1_precip: medium effects, significant for PC 2 only

These interactions result from ‘logical’ considerations, but should be double-checked with Lucia.

The model output is given by:


## Response PC1 :
## 
## Call:
## lm(formula = PC1 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip + Lon:Lat + sand_fraction:silt_fraction + PC1_temp:PC1_precip, 
##     data = d_train %>% dplyr::select(-c(clay_fraction)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9431  -1.9021  -0.1249   1.9098   6.6050 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -2.910e+01  2.438e+00 -11.936  < 2e-16 ***
## Lon                           6.728e-02  1.167e-02   5.767 9.91e-09 ***
## Lat                           2.783e-01  1.659e-02  16.780  < 2e-16 ***
## sand_fraction                 2.978e+01  2.731e+00  10.907  < 2e-16 ***
## silt_fraction                 2.541e+01  4.341e+00   5.853 5.98e-09 ***
## bulkdensity_soil             -2.541e+00  7.446e-01  -3.412 0.000662 ***
## ph_soil                      -2.545e-01  1.695e-01  -1.502 0.133424    
## soilcarbon                   -2.660e-01  6.683e-02  -3.980 7.24e-05 ***
## time_since_dist               3.561e-04  5.684e-04   0.627 0.531034    
## ScenarioSSP1-RCP2.6          -2.040e+00  2.136e-01  -9.550  < 2e-16 ***
## ScenarioSSP3-RCP7.0          -1.695e+00  2.439e-01  -6.951 5.54e-12 ***
## ScenarioSSP5-RCP8.5          -1.823e+00  2.644e-01  -6.893 8.23e-12 ***
## initial_recruitment_BNE       9.675e-03  2.368e-03   4.086 4.63e-05 ***
## initial_recruitment_IBS       4.236e-03  3.994e-03   1.060 0.289101    
## initial_recruitment_TeBS      4.443e-02  1.806e-02   2.460 0.014011 *  
## initial_recruitment_Tundra   -8.874e-03  3.589e-03  -2.473 0.013532 *  
## initial_recruitment_otherC   -8.847e-03  4.622e-03  -1.914 0.055813 .  
## recruitment_ten_years_BNE    -3.474e-04  1.701e-04  -2.042 0.041314 *  
## recruitment_ten_years_IBS     1.037e-03  6.697e-04   1.548 0.121896    
## recruitment_ten_years_TeBS   -6.037e-03  3.210e-03  -1.881 0.060239 .  
## recruitment_ten_years_Tundra  3.275e-04  2.318e-04   1.413 0.157839    
## recruitment_ten_years_otherC  5.109e-04  1.389e-03   0.368 0.713060    
## previous_state_BNE            1.338e-03  2.499e-02   0.054 0.957293    
## previous_state_IBS           -6.364e-03  1.787e-02  -0.356 0.721821    
## previous_state_TeBS           1.690e-01  1.122e-01   1.506 0.132179    
## previous_state_Tundra         6.545e-03  1.502e-01   0.044 0.965260    
## previous_state_otherC        -1.781e-01  3.744e-02  -4.756 2.18e-06 ***
## PC1_temp                      6.345e-02  3.756e-03  16.896  < 2e-16 ***
## PC1_precip                   -7.359e-04  4.344e-05 -16.942  < 2e-16 ***
## Lon:Lat                      -1.289e-03  2.016e-04  -6.396 2.16e-10 ***
## sand_fraction:silt_fraction  -2.254e+01  6.880e+00  -3.276 0.001080 ** 
## PC1_temp:PC1_precip          -1.508e-06  9.693e-07  -1.556 0.120048    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.585 on 1411 degrees of freedom
## Multiple R-squared:  0.6454, Adjusted R-squared:  0.6377 
## F-statistic: 82.86 on 31 and 1411 DF,  p-value: < 2.2e-16
## 
## 
## Response PC2 :
## 
## Call:
## lm(formula = PC2 ~ Lon + Lat + sand_fraction + silt_fraction + 
##     bulkdensity_soil + ph_soil + soilcarbon + time_since_dist + 
##     Scenario + initial_recruitment_BNE + initial_recruitment_IBS + 
##     initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC + 
##     recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS + 
##     recruitment_ten_years_Tundra + recruitment_ten_years_otherC + 
##     previous_state_BNE + previous_state_IBS + previous_state_TeBS + 
##     previous_state_Tundra + previous_state_otherC + PC1_temp + 
##     PC1_precip + Lon:Lat + sand_fraction:silt_fraction + PC1_temp:PC1_precip, 
##     data = d_train %>% dplyr::select(-c(clay_fraction)))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0715 -0.2195  0.2604  0.6879  7.6522 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   1.161e+00  1.365e+00   0.851 0.394849    
## Lon                          -1.603e-02  6.530e-03  -2.454 0.014240 *  
## Lat                          -3.921e-02  9.284e-03  -4.223 2.56e-05 ***
## sand_fraction                 2.507e+00  1.528e+00   1.641 0.101114    
## silt_fraction                 2.476e+00  2.430e+00   1.019 0.308373    
## bulkdensity_soil             -2.159e-01  4.167e-01  -0.518 0.604432    
## ph_soil                       1.010e-01  9.486e-02   1.064 0.287364    
## soilcarbon                   -1.830e-02  3.740e-02  -0.489 0.624741    
## time_since_dist              -3.025e-04  3.181e-04  -0.951 0.341881    
## ScenarioSSP1-RCP2.6           4.042e-01  1.196e-01   3.380 0.000744 ***
## ScenarioSSP3-RCP7.0           7.273e-01  1.365e-01   5.327 1.16e-07 ***
## ScenarioSSP5-RCP8.5           6.767e-01  1.480e-01   4.572 5.26e-06 ***
## initial_recruitment_BNE       2.814e-04  1.325e-03   0.212 0.831830    
## initial_recruitment_IBS      -8.733e-03  2.236e-03  -3.906 9.82e-05 ***
## initial_recruitment_TeBS     -1.248e-01  1.011e-02 -12.343  < 2e-16 ***
## initial_recruitment_Tundra    2.025e-03  2.009e-03   1.008 0.313722    
## initial_recruitment_otherC   -2.934e-03  2.587e-03  -1.134 0.256870    
## recruitment_ten_years_BNE     8.613e-05  9.521e-05   0.905 0.365826    
## recruitment_ten_years_IBS     1.454e-03  3.749e-04   3.880 0.000109 ***
## recruitment_ten_years_TeBS    1.761e-02  1.797e-03   9.799  < 2e-16 ***
## recruitment_ten_years_Tundra -2.563e-04  1.297e-04  -1.975 0.048407 *  
## recruitment_ten_years_otherC  1.558e-04  7.774e-04   0.200 0.841174    
## previous_state_BNE           -2.542e-02  1.398e-02  -1.818 0.069332 .  
## previous_state_IBS           -3.949e-03  1.000e-02  -0.395 0.693047    
## previous_state_TeBS          -3.264e-01  6.279e-02  -5.198 2.31e-07 ***
## previous_state_Tundra        -2.139e-01  8.409e-02  -2.543 0.011094 *  
## previous_state_otherC        -8.429e-02  2.096e-02  -4.022 6.07e-05 ***
## PC1_temp                      2.366e-02  2.102e-03  11.257  < 2e-16 ***
## PC1_precip                    1.424e-05  2.431e-05   0.586 0.558165    
## Lon:Lat                       2.647e-04  1.128e-04   2.346 0.019114 *  
## sand_fraction:silt_fraction  -7.178e+00  3.851e+00  -1.864 0.062531 .  
## PC1_temp:PC1_precip          -2.693e-06  5.425e-07  -4.964 7.75e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.447 on 1411 degrees of freedom
## Multiple R-squared:  0.3668, Adjusted R-squared:  0.3528 
## F-statistic: 26.36 on 31 and 1411 DF,  p-value: < 2.2e-16

Since the residuals and the prediction accuracy hardly changes, these plots are left out here.


6. Models for single PFTs

In order to assess whether considering the PFTs separately might mitigate the structure, the initial model is fitted to FPCA scores using one PFT at a time. The following figures show the residuals for each PFT.


Figure 13: Model residuals and Q-Q plots for both predicted PCs for PFT Needleleaf evergreen (on train data set).
Figure 13: Model residuals and Q-Q plots for both predicted PCs for PFT Needleleaf evergreen (on train data set).

Figure 14: Model residuals and Q-Q plots for both predicted PCs for PFT Pioneering broadleaf (on train data set).
Figure 14: Model residuals and Q-Q plots for both predicted PCs for PFT Pioneering broadleaf (on train data set).

Figure 15: Model residuals and Q-Q plots for both predicted PCs for PFT Conifers (other) (on train data set).
Figure 15: Model residuals and Q-Q plots for both predicted PCs for PFT Conifers (other) (on train data set).

Figure 16: Model residuals and Q-Q plots for both predicted PCs for PFT Temperate broadleaf (on train data set).
Figure 16: Model residuals and Q-Q plots for both predicted PCs for PFT Temperate broadleaf (on train data set).

Figure 17: Model residuals and Q-Q plots for both predicted PCs for PFT Tundra (on train data set).
Figure 17: Model residuals and Q-Q plots for both predicted PCs for PFT Tundra (on train data set).

Interestingly, the general residual shape in Figure 1 resembles strongly to that of both Needleleaf evergreen and Pioneering broadleaf in Figures 13 and 14, respectively. However, the residuals are bounded for all five PFTs, but they differ strongly in the shape of the structure.

7. Models for the original data

One possible explanation for the structured residuals is that the model is not compatible with the MFPCA scores. In order to investigate this further, the model is fit to the original data. Clearly, this is not possible for the whole data set, but the model can be fed with shares of specific time points and specific PFTs. Since the analysis above indicated a strong influence of Needleleaf evergreen (BNE), in the following, this PFT is considered for different time points. Figure 18 shows the residuals for five different time points, namely after 20, 40, 60, 80 and 100 years of recovery.


Figure 18: Model residuals and Q-Q plots for five different time points and PFT Needleleaf evergreen (on train data set).
Figure 18: Model residuals and Q-Q plots for five different time points and PFT Needleleaf evergreen (on train data set).

Apparently, the longer the recovery period, the more this structure becomes apparent. So the thought of MFPCA scores are not compatible with the model is not confirmed.

This leads to the question: If it is not the target data, nor non-linear or interaction effects, and not because of the FPCA scores for temperature and precipitation - what else could be the reason for the structured residuals?

Conclusion

After the attempts above and even more non-shown here to mitigate the structure in the residuals, the question, why the residuals show this structure, remains. However, I can think of two possible reasons:

  • The data is simulated and therefore interdependent. Maybe this first model attempt unfolds these dependencies which results in the observed structures?
  • Maybe the model in general is not appropriate and other techniques for functional data should be applied to that kind of data?

Appendix

A.1 AIC values for all presented approaches

The following table shows AIC values for all presented approaches. Clearly, the additive model wins in terms of lowest values, while the model fitted to the absolute `cmass’ values has twice the AIC values.


Initial model Stepwise selection Softmax Absolute cmass Additive model Interactions
AIC 12103 12099 10251 24927 8382 12020

A.2 Further Residual Plots

Here, each covariate is plotted against the residuals for the first model:


Figure 19: Residuals for location covariates (Lon and Lat; on train data set).
Figure 19: Residuals for location covariates (Lon and Lat; on train data set).

Figure 20: Residuals for soil composition (on train data set).
Figure 20: Residuals for soil composition (on train data set).

Figure 21: Residuals for soil attributes (on train data set).
Figure 21: Residuals for soil attributes (on train data set).

Figure 22: Residuals for initial recruitment (on train data set).
Figure 22: Residuals for initial recruitment (on train data set).

Figure 23: Residuals for recruitment after ten years (on train data set).
Figure 23: Residuals for recruitment after ten years (on train data set).

Figure 24: Residuals for previous vegetation (on train data set).
Figure 24: Residuals for previous vegetation (on train data set).

Figure 25: Residuals for climate FPCA scores (on train data set).
Figure 25: Residuals for climate FPCA scores (on train data set).

A.3 First two PCs of target MFPCA for interpretation


Figure 26: First PC of target variable: BNE, IBS, otherC, TeBS, Tundra.
Figure 26: First PC of target variable: BNE, IBS, otherC, TeBS, Tundra.

Figure 27: Second PC of target variable: BNE, IBS, otherC, TeBS, Tundra.
Figure 27: Second PC of target variable: BNE, IBS, otherC, TeBS, Tundra.

A.4 Spatial distribution of residuals


Figure 28: Spatial distribution of residuals corresponding to the first PC.
Figure 28: Spatial distribution of residuals corresponding to the first PC.

Figure 29: Spatial distribution of residuals corresponding to the second PC.
Figure 29: Spatial distribution of residuals corresponding to the second PC.